13 - 25.7. Regression and Classification with Linear Models (Part 3) [ID:30381]

Okay, so we're switching gears here. We've done linear regression, which finds a line that models the behavior of a point set. What we want to do now is linear classification, and the idea is very simple: if you have a couple of points, some of them good and some of them bad, you want to be able to draw a line between them that separates them, as in this case. Any such line separates the space into two parts, the good and the bad, and that is what lets us make predictions.

The classic example here is distinguishing earthquakes from underground nuclear explosions. You look at the waves your seismometers register; there are two kinds, the surface wave and the body wave, and how big each of them is differs between explosions and earthquakes. So in this case, you want to have a separator here.

We'll call a set of examples linearly separable if there is such a separating line, or hyperplane, and inseparable if there's no linear separator.

Okay, and again, two real values give us the separator. We can classify the examples by whether they lie below or above it: if the separator is given by weights w1 and w2, then the examples where w1 x1 + w2 x2 is greater than zero are the positive examples, and those where it is lower than zero are the negative examples.
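Written out explicitly (a minimal restatement of this decision rule; treating the boundary case as positive is an assumption, and is the usual convention):

\[
h_{\mathbf{w}}(x_1, x_2) =
\begin{cases}
1 \ \text{(positive)} & \text{if } w_1 x_1 + w_2 x_2 \ge 0,\\
0 \ \text{(negative)} & \text{otherwise.}
\end{cases}
\]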

Essentially, what you want is to take this space and transform it into a space where the separator is the real line; that's what this decision rule does here. And again, we can do exactly the same thing as in regression: if we introduce a dummy coordinate x0 = 1, then we can write the whole thing as a dot product w · x again, and that makes the whole thing slightly simpler.
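As a small sketch of this trick (the weight and feature values are hypothetical, and NumPy is assumed):

```python
import numpy as np

def classify(w, x):
    """Hard-threshold linear classifier: 1 if w . x >= 0, else 0.

    w : weight vector including the bias weight w0
    x : feature vector WITHOUT the dummy coordinate
    """
    x_aug = np.concatenate(([1.0], x))  # prepend dummy coordinate x0 = 1
    return 1 if np.dot(w, x_aug) >= 0 else 0

# Illustrative weights w0, w1, w2 for two features
w = np.array([-0.5, 1.0, -1.0])
print(classify(w, np.array([2.0, 0.5])))  # -> 1 (positive side of the separator)
print(classify(w, np.array([0.5, 2.0])))  # -> 0 (negative side)
```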

Okay, so if you think about solving this, the realization is that you can think of the hypothesis as a threshold function: all that matters is whether w · x is greater or smaller than zero. So if you want to minimize the loss, you are minimizing a function of T(w · x), where T is essentially a step function. And this step function has the problem that we lose differentiability: the derivative is zero away from the step, and undefined at the step itself. So don't look for any closed-form solutions with high-school methods. It doesn't work.
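Concretely, writing the threshold function T out (the boundary convention z ≥ 0 → 1 is assumed, as above):

\[
T(z) = \begin{cases} 1 & \text{if } z \ge 0,\\ 0 & \text{if } z < 0, \end{cases}
\qquad\Rightarrow\qquad
T'(z) = 0 \ \text{for } z \neq 0, \quad T'(0) \ \text{undefined.}
\]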

What still does work is gradient descent.

(A brief aside on the slide notation: that curly symbol is a calligraphic T, not an Arabic letter, and the exponent there should indeed be a one. Thanks.)

We can't use any closed-form solutions, but we can use the following update rule in gradient descent, which looks exactly like the one before: w_i ← w_i + α (y − h_w(x)) · x_i. That actually works. If you think about it, we really have three possibilities here. If y is the same as the hypothesis h_w(x), we've correctly classified the example; then the term (y − h_w(x)) is zero, and we do nothing. If y is one and we've classified it as zero, then we want to make w · x bigger, so the rule adds α x_i to each weight, which increases w · x. And symmetrically, if y is zero and we've classified it as one, the rule subtracts α x_i, making w · x smaller.
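A minimal sketch of this update rule in code (the toy data, learning rate, and iteration count are illustrative assumptions; NumPy is assumed):

```python
import numpy as np

def threshold(z):
    """Hard threshold: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def perceptron_update(w, x, y, alpha=0.1):
    """One step of the rule w_i <- w_i + alpha * (y - h_w(x)) * x_i.

    If the example is classified correctly, (y - h) is 0 and nothing changes;
    if y = 1 but h = 0, the weights move so that w . x grows, and vice versa.
    """
    h = threshold(np.dot(w, x))
    return w + alpha * (y - h) * x

# Toy linearly separable data (dummy coordinate x0 = 1 already prepended)
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 1.5, 0.5],
              [1.0, 0.2, 1.8],
              [1.0, 0.5, 2.5]])
Y = np.array([1, 1, 0, 0])

w = np.zeros(3)
for _ in range(100):                # fixed number of passes, for illustration
    for x, y in zip(X, Y):
        w = perceptron_update(w, x, y)
print(w)  # learned weights; the sign of w . x now separates the two classes
```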

Part of chapter: Chapter 25. Learning from Observations

Accessible via: Open Access

Duration: 00:18:04 min

Recording date: 2021-03-30

Uploaded on: 2021-03-30 16:57:56

Language: en-US

Linear classifiers with a hard threshold, their learning curves, and logistic regression are discussed.
